The Overfitting in Probabilistic Latent Semantic Models
Abstract
A symmetric EM algorithm is proposed for probabilistic latent semantic analysis in collaborative filtering. The algorithm reveals the latent interest profiles of both users and items, from which high-quality similarity measures of all required types (user–user, item–item, and item–user) can easily be constructed. The advantage of the proposed approach is that the different profiles are consistent with each other owing to the symmetry of the algorithm. To estimate the quality of the profiles and similarity measures empirically, we use a sample of labeled items and the kNN classifier. Experiments show that excessive optimization is unnecessary and can lead to overfitting.
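As an illustration of the kind of model the abstract refers to, the following is a minimal sketch of textbook EM for PLSA in its symmetric parameterization, p(u, i) = Σ_t p(t) p(u|t) p(i|t), applied to a user-by-item count matrix. It is not the authors' symmetric EM algorithm, whose details are not given here. The function names `plsa_em`, `profiles`, and `cosine_similarity`, and the choice of cosine similarity, are assumptions made for this example.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """Standard EM for PLSA with p(u, i) = sum_t p(t) p(u|t) p(i|t).

    counts: (n_users, n_items) array of non-negative interaction counts.
    Returns p(t), p(u|t), p(i|t). Hypothetical helper, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation; each column of p(u|t) and p(i|t) sums to 1.
    p_t = np.full(n_topics, 1.0 / n_topics)
    p_u_t = rng.random((n_users, n_topics))
    p_u_t /= p_u_t.sum(axis=0, keepdims=True)
    p_i_t = rng.random((n_items, n_topics))
    p_i_t /= p_i_t.sum(axis=0, keepdims=True)

    for _ in range(n_iters):
        # E-step: posterior p(t | u, i) for every user-item pair.
        joint = p_t[None, None, :] * p_u_t[:, None, :] * p_i_t[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)

        # M-step: re-estimate all three distributions from expected counts.
        weighted = counts[:, :, None] * post
        topic_mass = weighted.sum(axis=(0, 1)) + eps
        p_u_t = weighted.sum(axis=1) / topic_mass
        p_i_t = weighted.sum(axis=0) / topic_mass
        p_t = topic_mass / topic_mass.sum()
    return p_t, p_u_t, p_i_t

def profiles(p_t, p_x_t):
    """Latent profile p(t | x) for each user or item x, via Bayes' rule."""
    joint = p_x_t * p_t[None, :]
    return joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a_n @ b_n.T
```

Because user and item profiles live in the same topic space, `cosine_similarity(profiles(p_t, p_u_t), profiles(p_t, p_i_t))` gives an item–user measure, and passing the same profile matrix twice gives user–user or item–item measures. In this sketch `n_iters` is the knob that controls how far the likelihood is optimized.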
Similar resources
Adjusting Mixture Weights of Gaussian Mixture Model via Regularized Probabilistic Latent Semantic Analysis
Mixture models, such as Gaussian Mixture Model, have been widely used in many applications for modeling data. Gaussian mixture model (GMM) assumes that data points are generated from a set of Gaussian models with the same set of mixture weights. A natural extension of GMM is the probabilistic latent semantic analysis (PLSA) model, which assigns different mixture weights for each data point. Thu...
Randomized Probabilistic Latent Semantic Analysis for Scene Recognition
The concept of probabilistic Latent Semantic Analysis (pLSA) has gained much interest as a tool for feature transformation in image categorization and scene recognition scenarios. However, a major issue of this technique is overfitting. Therefore, we propose to use an ensemble of pLSA models which are trained using random fractions of the training data. We analyze empirically the influence of t...
A Descriptive Framework for the Field of Data Mining and Knowledge Discovery
Abstracts of forty-nine regular papers from PAKDD 2005 [Ho et al. 2005], which were not used in the framework building process, were collected and analyzed to see if they fit in the categories identified by grounded theory. The abstract of each article was analyzed to identify the primary objective(s) the author(s) are addressing. Take the article “Adjusting Mixture Weights of Gaussian Mixture Model vi...
A hybrid PLSA approach for warmer cold start in folksonomy recommendation
We investigate the problem of item recommendation during the first months of the collaborative tagging community CiteULike. CiteULike is a so-called folksonomy where users have the possibility to organize publications through annotation tags. Making reliable recommendations during the initial phase of a folksonomy is a difficult task, since information about user preferences is meager. I...
A New Indexing Method to Optimize the Performance for Online Image Retrieval System using ABIR Method
This paper briefly describes the Markovian Semantic Indexing approach. The existing system uses the Latent Semantic Indexing (LSI) or Probabilistic Latent Semantic Indexing (PLSI) model. LSI suffers from low performance and speed on large-scale image collections, and the PLSI approach also has problems: it is incomplete, since it provides no probabilistic...